We define block-tree graphs as a tree-structured graph where each node is a cluster of nodes such that
I. Introduction

A graphical model is a random vector defined on a graph such that each node represents a random 
variable (or multiple random variables), and edges in the graph represent conditional independencies. 
The underlying graph structure in a graphical model leads to a factorization of the joint probability 
distribution. This property has led to graphical models being used in many applications such as sensor
networks, image processing, computer vision, bioinformatics, speech processing, and ecology [1], [2], to
name a few. This paper derives efficient algorithms on graphical models. The structure of the graph plays 
an important role in determining the complexity of these algorithms. Tree-structured graphs are suitable 
for deriving efficient inference and estimation algorithms [3]. Inference in graphical models corresponds
to finding marginal distributions given a joint probability distribution. Estimation of graphical models 
corresponds to performing inference over the conditional distribution p(x|y), where x is a random vector 
defined on a graph with noisy observations y. State-space models can be interpreted as graphical models 
defined on a chain or a tree [4], [5], for which efficient estimation algorithms include the Kalman filter
or recursive smoothers [6], [7]. Estimation and inference in arbitrary chain or tree structured graphical
models is achieved via belief propagation [3]. These graphical models, however, have limited modeling
capability [8], and it is thus desirable to consider more general graphs, i.e., graphs with cycles, an example
of which is shown in Fig. 1(a).



A popular method for inference in graphs with cycles is to perform variable elimination, where the joint 
probability distribution is marginalized according to a chosen elimination order, which is a permutation 
of the nodes in the graph. Frameworks for variable elimination have been proposed in [9]-[11]. A
general framework for variable elimination is achieved by constructing a junction-tree [12], which is
a tree-structured graph with edges between clusters of nodes. The key properties of junction-trees are 
highlighted as follows: 

(i) Clusters in a junction-tree: Two clusters connected by an edge in a junction-tree always have
at least one common node. The number of nodes in the cluster with maximum size minus one is
called the width of a graph, denoted as w(G) for a graph G.
(ii) Constructing junction-trees: This consists of two steps: triangulation, which has complexity O(n),
and a maximum spanning tree algorithm, which has complexity O(m^2), where m is the number
of cliques (see Section II-A) in a triangulated¹ graph [13]. The number of cliques m depends on
the connectivity of the graph: if a graph is dense (many edges), m can be small, and if a graph is
sparse (small number of edges), m can be as large as n - 1.
(iii) Optimal junction-trees: For a graph G, there can be many different associated junction-trees. An
optimal junction-tree is the one with minimal width, called the treewidth of the graph [14], denoted
as tw(G). Finding the optimal junction-tree, and thus the treewidth of a graph, requires a search
over at most n! possible combinations, where n is the number of nodes in a graph.
(iv) Complexity of inference: Inference in graphical models using junction-trees can be done using
algorithms proposed in [?], [15], [16]. The complexity of inference using junction-trees is
exponential in the treewidth of the graph [16].

¹A graph is triangulated if all cycles of length four or more have an edge connecting non-adjacent nodes in the cycle.

From the above analysis, it is clear that constructing junction-trees can be computationally difficult.
Further, finding optimal junction-trees is hard because of the large search space. Since finding optimal
junction-trees is hard, finding the treewidth of a graph is also hard [17]. Thus, in practice, the complexity
of inference using junction-trees depends on the upper bound on treewidth computed using heuristic
algorithms, such as those given in [18].

In this paper, we introduce block-tree graphs, as an alternative framework for constructing tree- 
structured graphs from graphs with cycles. The key difference between block-trees and junction-trees 
is that the clusters in a block-tree graph are disjoint, whereas clusters in a junction-tree have common 
nodes. We use the term block-tree because the adjacency matrix for block-tree graphs is block-structured 
under a suitable permutation of the nodes. The key properties of block-tree graphs and their comparison to
the junction-tree are outlined as follows:

(i') Clusters in a block-tree: Clusters in a block-tree graph are always disjoint. We call the number
of nodes in the cluster with maximum size the block-width, denoted as bw(G) for a graph G.
(ii') Constructing block-trees: We show that a graph can be transformed into a block-tree graph by
appropriately clustering nodes of the original graph. The algorithm we propose for constructing
block-tree graphs involves choosing a root cluster and finding successive neighbors. An important
property is that a block-tree graph is uniquely specified by the choice of the root cluster. Thus,
constructing block-tree graphs only requires knowledge of a root cluster, which is a small fraction
of the total number of nodes in the graph. On the other hand, constructing junction-trees requires
knowledge of an elimination order, which is a permutation of all the nodes in the graph.
Constructing a block-tree graph has complexity O(n), where n is the number of nodes in the graph.
When compared to junction-trees, we avoid the O(m^2) computational step, which is a significant
saving when m is as large as n.
(iii') Optimal block-trees: Different choices of root clusters result in different block-tree graphs. We
define an optimal block-tree graph as the block-tree graph with minimal block-width, which we call
the block-treewidth of a graph, denoted as btw(G). We show that computing the optimal block-tree,
and thus the block-treewidth, requires a search over $\binom{n}{\lceil n/2 \rceil}$ possible choices. Although
possibly very large for large n, this number is much less than n!, the search space for computing
optimal junction-trees.
(iv') Complexity of inference: We show that the complexity of using block-tree graphs for inference
over graphical models is exponential in the maximum sum of cluster sizes of adjacent clusters.
From (i')-(iii'), we see that constructing block-tree graphs is faster and finding optimal block-tree
graphs has a smaller search space. In general, the complexity of inference using block-tree graphs is
higher; however, we show that there do exist graphical models for which the complexity of inference is
the same for both the junction-tree and the block-tree graph.

Using disjoint clusters to derive efficient algorithms on graphical models has been considered in the 
past, but only in the context of specific graphical models. For example, [19] and [20] derive recursive
estimators for graphical models defined on a 2-D lattice by scanning the lattice horizontally (or vertically).
For specific directed graphs, the authors in [21] and [22] used specific disjoint clusters for inference.
To our knowledge, previous work has not addressed questions like optimality of different structures or 
proposed algorithms for constructing tree-structured graphs using disjoint clusters. Our block-tree graphs 
address these questions for any given graph, even non-lattice graphs and arbitrary directed graphs. 

Applying our block-tree graph framework to undirected graphical models with boundary conditions, 
such that the boundary nodes connect to the undirected components in a directed manner, we convert 
a boundary valued problem into an initial value problem. Motivation for using such graphs, which are 
referred to as chain graphs in the literature [23]-[25], is in accurately modeling physical phenomena
whose underlying dynamics are governed by partial differential equations with local conditions imposed 
on the boundaries. To not confuse chain structured graphs with chain graphs, in this paper we refer to 
chain graphs as boundary valued graphs. Such graphical models have been used extensively in the past to 
model images with boundary conditions being either Dirichlet, Neumann, or periodic; see [20], [26]-[29]
for examples. To enable recursive processing, past work has either ignored the effect of boundaries or 
assumed simpler boundary values. Using our block-tree graph framework, we cluster all boundary nodes 
in the chain graph into one cluster and then build the block-tree graph. In [30], we derived recursive
representations, which we called telescoping representations, for random fields over continuous indices
and random fields over lattices with boundary conditions. The results presented here extend the telescoping
representations to arbitrary boundary valued graphs, not necessarily restricted to boundary valued graphs
over 2-D lattices. Applying our block-tree graph framework to Gaussian graphical models, we get linear
state-space representations, which lead to recursive estimation equations like the Kalman filter [?] or
the Rauch-Tung-Striebel smoother [31].

As mentioned earlier, the complexity of inference in graphical models is exponential in the treewidth 
of the graph. Thus, inference in graphical models is computationally intractable when the treewidth is 
large [32]. For this reason, there is interest in efficient approximate inference algorithms. Loopy belief
propagation (LBP), where we ignore the cycles in a graph and apply belief propagation, is a popular
approach to approximate inference [3]. Although LBP works well in several graphs, convergence of LBP
is not guaranteed, or the convergence rate may be slow [?], [33]. Another class of algorithms is based
on decomposing a graph into several computationally tractable subgraphs and using the estimates on the
subgraphs to compute the final estimate [?], [34], [35]. We show how block-tree graphs can be used to
derive efficient algorithms for estimation in graphical models. The key step is in using the block-tree graph 
to find subgraphs, which we call spanning block-trees. We apply the spanning block-tree framework to the 
problem of estimation in Gaussian graphical models and show the improved performance over spanning 
trees. 

Organization: Section II reviews graphical models, inference algorithms for graphical models, and the
junction-tree algorithm. Section III introduces block-tree graphs, outlines an algorithm for constructing
block-tree graphs given an arbitrary undirected graph, and introduces optimal block-tree graphs. Section
IV outlines an algorithm for inference over block-tree graphs and discusses the computational complexity
of such algorithms. Section V considers the special case of boundary valued graphs. Section VI considers
the special case of Gaussian graphical models and derives linear recursive state-space representations,
using which we outline an algorithm for recursive estimation in graphical models. Section VII considers
the problem of approximate estimation of Gaussian graphical models by computing spanning block-trees.
Section VIII summarizes the paper.

II. Background and Preliminaries

Section II-A reviews graphical models. For a more complete study, we refer to [36]. Section II-B
reviews inference algorithms for graphical models. 



A. Review of Graphical Models 

Let $x = \{x_s \in \mathbb{R}^d : s \in V\}$ be a random vector defined on a graph $G = (V, E)$, where
$V = \{1, 2, \ldots, n\}$ is the set of nodes and $E \subset V \times V$ is the set of edges. Given any subset $W \subset V$, let
$x_W = \{x_s : s \in W\}$ denote the set of random variables on $W$. An edge between two nodes $s$ and $t$
can either be directed, which refers to an edge from node $s$ to node $t$, or undirected, where the ordering
does not matter, i.e., both $(s,t)$ and $(t,s)$ belong to the edge set $E$. One way of representing the edge
set is via an $n \times n$ adjacency matrix $A$ such that $A(i,j) = 1$ if $(i,j) \in E$, $A(i,j) = 0$ if $(i,j) \notin E$,
where we assume $A(i,i) = 1$ for all $i = 1, \ldots, n$. A path is a sequence of nodes such that there is
either an undirected or directed edge between any two consecutive nodes in the path. A graph with only
directed edges is called a directed graph. A directed graph with no cycles, i.e., there is no path with
the same start and end node, is called a directed acyclic graph (DAG). A graph with only undirected
edges is called an undirected graph. Since DAGs can be converted to undirected graphs via moralization,
see [36], in this paper, unless mentioned otherwise, we only study undirected graphs. For any $s \in V$,
$\mathcal{N}(s) = \{t \in V : (s,t) \in E\}$ defines the neighborhood of $s$ in the undirected graph $G = (V,E)$. The
degree of a node $s$, denoted $d(s)$, is the number of neighbors of $s$. A set of nodes $C$ in an undirected
graph is a clique if all the nodes in $C$ are connected to each other, i.e., every pair of nodes in $C$ is joined
by an undirected edge. A random vector $x$ defined on an undirected graph $G$ is referred to as an undirected
graphical model or a Markov random field. The edges in an undirected graph are used to specify a set of
conditional independencies in the random vector $x$. For any disjoint subsets $A$, $B$, $C$ of $V$, we say that $B$
separates $A$ and $C$ if all the paths between $A$ and $C$ pass through $B$. For undirected graphical models, the
global Markov property is defined as follows:

Definition 1 (Global Markov Property): For subsets $A$, $B$, $C$ of $V$ such that $B$ separates $A$ and $C$, $x_A$
is conditionally independent of $x_C$ given $x_B$, i.e., $x_A \perp x_C \mid x_B$.
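
The separation condition in Definition 1 is purely graph-theoretic, so it can be checked with a breadth-first search that is never allowed to enter $B$. The following is a minimal sketch under stated assumptions: the function name `separates`, the adjacency-dictionary representation, and the 4-node chain are illustrative choices and do not come from the paper.

```python
from collections import deque

def separates(adj, A, B, C):
    """Return True if every path from a node in A to a node in C passes through B."""
    blocked = set(B)
    frontier = deque(s for s in A if s not in blocked)
    seen = set(frontier)
    while frontier:
        s = frontier.popleft()
        if s in C:
            return False  # reached C without ever entering B
        for t in adj[s]:
            if t not in blocked and t not in seen:
                seen.add(t)
                frontier.append(t)
    return True

# Hypothetical 4-node chain 1 - 2 - 3 - 4 (not a graph from the paper).
chain = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(separates(chain, {1}, {2}, {4}))    # node 2 lies on the only path
print(separates(chain, {1}, set(), {4}))  # nothing blocks the path
```

When `separates` returns True, the global Markov property asserts the corresponding conditional independence; the search itself says nothing about the distribution.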

By the Hammersley-Clifford theorem, the probability distribution $p(x)$ of Markov models is factored
in terms of cliques as [?]

$$p(x) = \frac{1}{Z} \prod_{C \in \mathcal{C}} \psi_C(x_C) , \tag{1}$$

where $\{\psi_C(x_C)\}_{C \in \mathcal{C}}$ are positive potential functions, also known as clique functions, that depend only
on the variables in the clique $C \in \mathcal{C}$, and $Z$, the partition function, is a normalization constant.
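
To make the factorization in (1) concrete, the sketch below evaluates the product of clique potentials and computes $Z$ by brute-force enumeration. Everything specific here is an assumption made for illustration: the 3-node chain, the binary state space, and the `agree` potential are hypothetical, and enumeration is only feasible for very small models.

```python
import itertools

def unnormalized(x, potentials):
    """Product of clique potentials psi_C(x_C) for one configuration x (dict: node -> value)."""
    value = 1.0
    for clique, psi in potentials.items():
        value *= psi(*(x[s] for s in clique))
    return value

def partition_function(nodes, states, potentials):
    """Normalization constant Z: sum of the unnormalized product over all configurations."""
    return sum(
        unnormalized(dict(zip(nodes, values)), potentials)
        for values in itertools.product(states, repeat=len(nodes))
    )

def agree(a, b):
    """Hypothetical pairwise potential that favors equal neighbors."""
    return 2.0 if a == b else 1.0

# Hypothetical chain 1 - 2 - 3 with binary variables; its cliques are the two edges.
nodes = [1, 2, 3]
potentials = {(1, 2): agree, (2, 3): agree}
Z = partition_function(nodes, [0, 1], potentials)
p_000 = unnormalized({1: 0, 2: 0, 3: 0}, potentials) / Z
print(Z, p_000)
```

The enumeration visits every joint configuration, so its cost grows exponentially with the number of nodes.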

Throughout the paper, we assume that a given graph is connected, which means that there exists a 
path between any two nodes of the graph. If this condition does not hold, we can always split the graph 
into more than one connected graph and separately study each connected graph. A subgraph of a graph
$G = (V, E)$ is a graph whose vertices and edges are subsets of $V$ and $E$, respectively. In the next Section,
we review algorithms for doing inference in graphical models.

Fig. 1. Undirected graphs and their junction-trees. (a) Undirected graph. (b) Junction tree for (a). (c) Undirected graph. (d) Junction tree for (c).



B. Inference Algorithms 

Inference in graphical models corresponds to finding marginal distributions, say $p(x_s)$, given the joint
probability distribution $p(x)$ for $x = \{x_1, \ldots, x_n\}$. All inference algorithms derived on $p(x)$ can be
applied to the problem of estimation, where we want to marginalize the joint distribution $p(x|y)$ to find
$p(x_s|y)$, where $y$ is a noisy observation of the random vector $x$.
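
As a point of reference for what inference computes, the sketch below obtains a single-node marginal by summing a joint table over all other variables. It is a naive baseline, not one of the algorithms discussed in the paper, and the 3-variable binary chain and the `agree` weights are hypothetical.

```python
import itertools

def marginal(joint, index):
    """Sum a joint table {configuration tuple: probability} over all variables except `index`."""
    out = {}
    for config, prob in joint.items():
        out[config[index]] = out.get(config[index], 0.0) + prob
    return out

def agree(a, b):
    """Hypothetical pairwise weight that favors equal neighbors."""
    return 2.0 if a == b else 1.0

# Joint distribution of three binary variables on a hypothetical chain x0 - x1 - x2.
weights = {
    x: agree(x[0], x[1]) * agree(x[1], x[2])
    for x in itertools.product([0, 1], repeat=3)
}
Z = sum(weights.values())
joint = {x: w / Z for x, w in weights.items()}
p_x0 = marginal(joint, 0)
print(p_x0)
```

The joint table has one entry per configuration, which is why exploiting graph structure matters once the number of nodes grows.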

For tree-structured graphs, belief propagation [3] is an efficient algorithm for inference with complexity
linear in the number of nodes. For graphs with cycles, as discussed in Section I, a popular method is
to first construct a junction-tree and then apply belief propagation [12]. We now consider two examples
that will act as running examples throughout the paper.

Example 1: Consider the undirected graph in Fig. 1(a) and its junction-tree shown in Fig. 1(b). The
clusters in the junction-tree are represented as ellipses (these are the cliques in the triangulated graph
producing the junction-tree). On the edges connecting clusters, we have separator nodes that correspond
to the common nodes connecting two clusters. It can be shown that this junction-tree is optimal, and thus
the treewidth of the graph in Fig. 1(a) is three.

Example 2: By deleting the edge between nodes 3 and 5 in Fig. 1(a), we get the undirected graph in
Fig. 1(c). The optimal junction tree is shown in Fig. 1(d) (the separator nodes are ignored for simplicity).
The treewidth of the graph is two.

To do inference using junction-trees, we first associate potential functions with each clique. This is
done by grouping potentials from the original joint distribution and mapping them to their respective
cliques. For example, in Example 1 the junction tree has a clique $\{1, 2, 3\}$, so the clique function will
be $\Psi_{1,2,3} = \psi_{1,2}(x_1, x_2)\,\psi_{1,3}(x_1, x_3)$, where $\psi_{1,2}(x_1, x_2)$ and $\psi_{1,3}(x_1, x_3)$ are factors in the original
probability distribution. Having defined potential functions for each clique, a message passing scheme,
similar in spirit to belief propagation, can be formulated to compute marginal distributions for each clique
[12], [16]. The marginal distribution of each node can be subsequently computed by marginalizing the
distribution of the cliques. The following theorem summarizes the time and space complexity of doing
inference on junction trees.

Theorem 1 (Complexity of inference using junction tree [16]): For a random vector $x \in \mathbb{R}^n$ defined
on an undirected graph $G = (V, E)$ with each $x_s$ taking values in $\Omega$, the time complexity for doing
inference is exponential in the treewidth of the graph and the space complexity of doing inference is
exponential in the treewidth of the graph plus one.

Theorem 1 corresponds to the complexity of doing inference on an optimal junction tree. However,
finding the optimal junction-tree is hard in general, and thus the complexity is estimated by the upper
bound of the treewidth of the graph, which can be found using algorithms in [18]. The next Section
introduces block-tree graphs as an alternative tree decomposition for graphs with cycles and shows
that constructing block-tree graphs is less computationally intensive than constructing junction-trees and
finding optimal block-trees has a smaller search space than finding optimal junction-trees.

III. Block-Tree Graph

In this section, we introduce block-tree graphs and show the merits of using block-tree graphs over
junction-trees. Section III-A defines a block-tree graph and gives examples. Section III-B shows how to
construct block-tree graphs starting from a connected undirected graph. Section III-C introduces optimal
block-tree graphs.

A. Definition and Examples 

To define a block-tree graph, we first introduce block-graphs, which generalize the notion of graphs. 
Throughout this paper, we denote block-graphs by $\mathcal{G}$ and graphs by $G$.

Definition 2 (Block-graph): A block-graph is the tuple $\mathcal{G} = (\mathcal{C}, \mathcal{E})$, where $\mathcal{C} = \{C_1, C_2, \ldots, C_l\}$ is a
set of disjoint clusters and $\mathcal{E}$ is a set of edges such that $(i,j) \in \mathcal{E}$ if there exists an edge between the
clusters $C_i$ and $C_j$.

Fig. 2. Example of block-trees. (a) Block-tree. (b) An undirected graph for (a). (c) Junction tree for Fig. 2(b). (d) Block-tree for Fig. 1(c).

Let the cardinality of each cluster be $\gamma_k = |C_k|$, and let $n$ be the total number of nodes. If $\gamma_k = 1$
for all $k$, $\mathcal{G}$ reduces to an undirected graph. For every block-graph $\mathcal{G}$, we associate a graph $G = (V,E)$,
where $V$ is the set of all nodes in the graph and $E$ is the set of edges between nodes of the graph. For
each $(i,j) \in \mathcal{E}$, the set of edges $E$ will contain at least one edge connecting two nodes in $C_i$ and $C_j$ or
connecting nodes within $C_i$ or $C_j$. A complete block-graph is defined as follows.

Definition 3 (Complete block-graph): For a block-graph $\mathcal{G} = (\mathcal{C}, \mathcal{E})$, if all the nodes in each cluster $C_i$
have an edge between them, and for all $(i,j) \in \mathcal{E}$ all the nodes in $C_i$ and all the nodes in $C_j$ have an edge
between them, then $\mathcal{G}$ is a complete block-graph.

We now introduce block-tree graphs.

Definition 4 (Block-tree graph): A block-graph $\mathcal{G} = (\mathcal{C}, \mathcal{E})$ is called a block-tree graph if there exists
only one path connecting any two clusters $C_i$ and $C_j$.

Thus, block-tree graphs generalize tree-structured graphs. Using Definition 3, we can define a complete
block-tree graph. As an example, consider the block-tree graph shown in Fig. 2(a), where $C_1 = \{1\}$,
$C_2 = \{2, 3\}$, $C_3 = \{4, 5, 6\}$, $C_4 = \{7, 8\}$, and $C_5 = \{9\}$. A complete block-tree graph corresponding to
the block-tree in Fig. 2(a) is shown in Fig. 2(b). The block-tree graph in Fig. 2(a) serves as a representation
for a family of undirected graphs. For example, Fig. 2(a) serves as a representation for the graphs in
Fig. 1(a) and Fig. 1(c). This can be seen by removing edges from Fig. 2(b). In the next Section, we
consider the problem of constructing a block-tree graph given an undirected graph.



B. Constructing Block-Tree Graphs 

Our algorithm for constructing block-tree graphs is outlined in Algorithm 1. The input to the algorithm
is a connected graph $G$ and an initial cluster $V_1$, which we call the root cluster. The output of the algorithm
is a block-tree graph $\mathcal{G} = (\mathcal{C}, \mathcal{E})$. The key steps of the algorithm are highlighted as follows:

Forward Pass: (Lines 3-7) Starting from the root cluster $V_1$, we iteratively find successive neighbors of
$V_1$ to construct a sequence of $r$ clusters $V_1, V_2, \ldots, V_r$ such that $V_2 = \mathcal{N}(V_1) \setminus V_1$, $V_3 = \mathcal{N}(V_2) \setminus \{V_1 \cup V_2\}$,
$\ldots$, $V_r = \mathcal{N}(V_{r-1}) \setminus \{V_{r-2} \cup V_{r-1}\}$, where $\mathcal{N}(V_k)$ are the neighbors of the set of nodes in $V_k$. This is
shown in Line 5 of Algorithm 1. For each $V_k$, $k = 2, \ldots, r$, we split $V_k$ into $m_k$ disjoint clusters
$\{V_k^1, \ldots, V_k^{m_k}\}$ such that $\bigcup_j V_k^j = V_k$ and there are no edges between the clusters $V_k^i$ and $V_k^j$ for $i \neq j$.
This is shown in Line 6 of Algorithm 1.
Algorithm 1 Constructing Block-tree Graphs

 1: procedure ConstructBlockTree($G$, $V_1$)
 2:   $r = 1$; $V_0 = \{\}$
 3:   while $\bigcup_{k=1}^{r} V_k \neq V$ do
 4:     $r = r + 1$
 5:     Find neighbors: $V_r = \{k : (j,k) \in E,\ j \in V_{r-1}\} \setminus \{V_{r-2} \cup V_{r-1}\}$
 6:     Split cluster: $\{V_r^1, \ldots, V_r^{m_r}\}$ s.t. for all $i \in V_r^{n_1}$ and $j \in V_r^{n_2}$, $n_1 \neq n_2$, $(i,j) \notin E$.
 7:   end while
 8:   for $i = r, r-1, \ldots, 3$ do
 9:     for $j = 1, \ldots, m_i$ do
10:       Update cluster: Find $\{V_{i-1}^{n_1}, \ldots, V_{i-1}^{n_w}\}$ s.t. there exist nodes $s_1, \ldots, s_w$, where $s_k \in V_{i-1}^{n_k}$,
          s.t. $(s_k, t) \in E$ for some $t \in V_i^j$. Combine $\{V_{i-1}^{n_1}, \ldots, V_{i-1}^{n_w}\}$ into one cluster and update $V_{i-1}$.
11:     end for
12:   end for
13:   Relabel clusters as $C_1, \ldots, C_l$ and find edge set $\mathcal{E}$ s.t. $(i,j) \in \mathcal{E}$ if there exists an edge between
      $C_i$ and $C_j$.
14: end procedure



Backwards Pass: (Lines 8-13) In this step, we find the final clusters of nodes given $V_1, V_2, \ldots, V_r$.
Starting at $V_r = \{V_r^1, \ldots, V_r^{m_r}\}$, for each $V_r^j$, $j = 1, \ldots, m_r$, we find all clusters $\{V_{r-1}^{n_1}, \ldots, V_{r-1}^{n_w}\}$ such
that there exists an edge between $V_{r-1}^{n_k}$, $k = 1, \ldots, w$, and $V_r^j$. Combine $\{V_{r-1}^{n_1}, \ldots, V_{r-1}^{n_w}\}$ into one cluster
and then update the clusters in $V_{r-1}$ accordingly. We repeat the above steps for all clusters $V_{r-1}, \ldots, V_3$.
Thus, if $r = 2$, the backwards step is not needed. Relabel all the clusters such that $\mathcal{C} = \{C_1, \ldots, C_l\}$
and find the edge set $\mathcal{E}$.

The forward pass of the algorithm first finds a chain structured block-graph over the clusters $V_1, V_2, \ldots,
V_r$. The backwards pass then splits the clusters in $V_k$ to get a tree-structured graph. The key intuition
utilized in the backwards pass is that each cluster in $V_k$ connects to only one cluster in $V_{k-1}$. If there is
more than one such cluster in $V_{k-1}$, it is trivial to see that the resultant block-graph will have a cycle
and will no longer be a block-tree graph.

As an example, consider finding block-tree graphs for the graph in Fig. 1(a). Starting with the root
cluster $V_1 = \{1\}$, we have $V_2 = \{2,3\}$, $V_3 = \{4,5,6\}$, $V_4 = \{7,8\}$, and $V_5 = \{9\}$. Further splitting the
clusters $V_k$ and running the backwards pass, the clusters do not split, and we get the block-tree graph in
Fig. 2(a). We get the same block-tree graph if we start from the root cluster $V_1 = \{9\}$. Now suppose
we start with the root cluster $V_1 = \{2,3\}$. Then, we have $V_2 = \{1,4,5,6\}$, $V_3 = \{7,8\}$, and $V_4 = \{9\}$.
Splitting these clusters (Line 6 in Algorithm 1), we have $V_2^1 = \{1\}$, $V_2^2 = \{4,6\}$, $V_2^3 = \{5\}$, $V_3^1 = \{7\}$,
$V_3^2 = \{8\}$, and $V_4^1 = \{9\}$. Given these clusters, we now apply the backwards pass to find the final set
of clusters:

1) The cluster $V_4^1 = \{9\}$ has edges in both $V_3^1 = \{7\}$ and $V_3^2 = \{8\}$, so we combine $V_3^1$ and $V_3^2$ to
get $V_3^1 = \{7, 8\}$.

2) Next, the cluster $V_3^1 = \{7, 8\}$ has edges in $V_2^2$ and $V_2^3$, so we combine these clusters and get
$V_2^1 = \{1\}$ and $V_2^2 = \{4,5,6\}$.

Given the above clusters, we get the same block-tree graph in Fig. 2(a). The need for the backwards pass
in Algorithm 1 is clear from the above example since it successfully splits the cluster $V_2$ with four nodes
into two smaller clusters. As another example, the block-tree graph for the graph in Fig. 1(c) using a
root cluster of $V_1 = \{1\}$ is shown in Fig. 2(d).
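
The forward and backward passes can be sketched in a few lines of Python. This is one possible reading of Algorithm 1, not the authors' implementation. Two things are assumptions: the edge list `FIG_1C` is read off the pairwise factors in (4), with `FIG_1A` adding the edge between nodes 3 and 5 as described in Example 2, and the forward pass subtracts all previously visited nodes, which gives the same $V_r$ as Line 5 because neighbors of $V_{r-1}$ can only lie in $V_{r-2}$, $V_{r-1}$, or $V_r$. The function names are hypothetical.

```python
from collections import defaultdict

def split_into_components(nodes, adj):
    """Line 6: split `nodes` into groups with no edges between different groups."""
    remaining, groups = set(nodes), []
    while remaining:
        stack = [remaining.pop()]
        group = set(stack)
        while stack:
            for t in adj[stack.pop()] & remaining:
                remaining.remove(t)
                group.add(t)
                stack.append(t)
        groups.append(group)
    return groups

def construct_block_tree(edges, root):
    """Return (clusters, block_edges) for a connected graph and a root cluster."""
    adj = defaultdict(set)
    for s, t in edges:
        adj[s].add(t)
        adj[t].add(s)
    # Forward pass (Lines 3-7): successive neighbor sets, each split into clusters.
    levels, visited, frontier = [[set(root)]], set(root), set(root)
    while len(visited) < len(adj):
        frontier = set().union(*(adj[s] for s in frontier)) - visited
        if not frontier:
            raise ValueError("graph must be connected")
        levels.append(split_into_components(frontier, adj))
        visited |= frontier
    # Backward pass (Lines 8-12): merge parent clusters that share a child cluster.
    for i in range(len(levels) - 1, 1, -1):
        for child in levels[i]:
            touched = set().union(*(adj[t] for t in child))
            hit = [c for c in levels[i - 1] if c & touched]
            keep = [c for c in levels[i - 1] if not c & touched]
            levels[i - 1] = keep + [set().union(*hit)]
    # Line 13: relabel clusters and collect edges between clusters.
    clusters = [frozenset(c) for level in levels for c in level]
    owner = {s: k for k, c in enumerate(clusters) for s in c}
    block_edges = {tuple(sorted((owner[s], owner[t])))
                   for s, t in edges if owner[s] != owner[t]}
    return clusters, block_edges

# Edge lists assumed from the factors in (4) and from Example 2.
FIG_1C = [(1, 2), (1, 3), (2, 4), (3, 4), (3, 6), (4, 6),
          (4, 7), (6, 7), (6, 8), (7, 9), (8, 9), (8, 5)]
FIG_1A = FIG_1C + [(3, 5)]

for root in ({1}, {2, 3}):
    clusters, block_edges = construct_block_tree(FIG_1A, root)
    print(sorted(sorted(c) for c in clusters), len(block_edges))
```

Under these assumptions, both root clusters give the five clusters listed for Fig. 2(a), in agreement with the worked example.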

Notice that in Algorithm 1 we did not split the root cluster $V_1$. Thus, one of the clusters in $\mathcal{C}$ will be
$V_1$. Without loss of generality, we assume $C_1 = V_1$. We now show that Algorithm 1 always gives us a
unique block-tree graph for each choice of root cluster $V_1$.

Theorem 2: Algorithm 1 always outputs a block-tree graph $\mathcal{G} = (\mathcal{C}, \mathcal{E})$ for each possible choice of root
cluster $V_1$ and connected undirected graph $G = (V,E)$. Further, the block-tree graph $\mathcal{G}$ is
unique.

Proof: For the root cluster $V_1$, after the backwards pass of the algorithm, we have the set of clusters:
$V_1, \{V_2^1, V_2^2, \ldots, V_2^{m_2}\}, \ldots, \{V_r^1, V_r^2, \ldots, V_r^{m_r}\}$. By construction, there are no edges between $V_k^{n_1}$ and
$V_k^{n_2}$ for $n_1 \neq n_2$. The total number of clusters is $l = 1 + \sum_{k=2}^{r} m_k$. For $\mathcal{G} = (\mathcal{C}, \mathcal{E})$ to be a block-tree
graph, the undirected graph $(\{1, 2, \ldots, l\}, \mathcal{E})$ must be a tree-structured graph. For this, it must be
connected and the number of edges must be $|\mathcal{E}| = l - 1$. The block-tree graph formed using
Algorithm 1 is connected by construction since the original undirected graph $G = (V, E)$ is connected.
Counting the number of edges between clusters, we have $l - 1$ edges, and thus the output of Algorithm
1 is a block-tree graph. The uniqueness of the block-tree graph follows from construction. ■
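
The criterion used in the proof (connected, with exactly $l - 1$ edges between clusters) is easy to check numerically for any proposed clustering. The sketch below does so. The function name `is_block_tree` is hypothetical, and the edge list for Fig. 1(a) is an assumption read off the factors in (4) plus the edge between nodes 3 and 5 from Example 2.

```python
def is_block_tree(clusters, edges):
    """Check that the block-graph induced by `clusters` is connected and has l - 1 edges."""
    owner = {s: k for k, c in enumerate(clusters) for s in c}
    block_edges = {frozenset((owner[s], owner[t]))
                   for s, t in edges if owner[s] != owner[t]}
    neighbours = {k: set() for k in range(len(clusters))}
    for e in block_edges:
        i, j = e
        neighbours[i].add(j)
        neighbours[j].add(i)
    # Depth-first search from cluster 0 to test connectivity.
    seen, stack = {0}, [0]
    while stack:
        for j in neighbours[stack.pop()] - seen:
            seen.add(j)
            stack.append(j)
    return len(seen) == len(clusters) and len(block_edges) == len(clusters) - 1

# Assumed edge list for Fig. 1(a).
FIG_1A = [(1, 2), (1, 3), (2, 4), (3, 4), (3, 6), (4, 6), (4, 7),
          (6, 7), (6, 8), (7, 9), (8, 9), (8, 5), (3, 5)]

merged = [{1}, {2, 3}, {4, 5, 6}, {7, 8}, {9}]
unmerged = [{1}, {2}, {3}, {4, 5, 6}, {7, 8}, {9}]
print(is_block_tree(merged, FIG_1A), is_block_tree(unmerged, FIG_1A))
```

The second clustering leaves nodes 2 and 3 in separate clusters that both connect to $\{1\}$ and to $\{4,5,6\}$, which creates a cycle between clusters; this is the situation the backwards pass removes by merging.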

The next theorem characterizes the complexity of constructing block-tree graphs. 

Theorem 3 (Complexity of Algorithm 1): The complexity of constructing a block-tree graph is $O(n)$,
where $n$ is the number of nodes in the graph.

Proof: The proof is trivial since the algorithm involves traversing the nodes of the graph. We do
this twice, once during the forward pass and once during the backwards pass. ■

Comparison to junction-trees: As mentioned before, the key difference between block-trees and junction-trees
is that block-trees are constructed using disjoint clusters, whereas clusters in a junction-tree have
common nodes. Constructing block-trees is computationally more efficient since constructing junction-trees
requires an additional complexity of $O(m^2)$, where $m$ can be as large as $n$ for sparse graphs. From
Algorithm 1 and Theorem 2, we note that a block-tree graph is uniquely specified using a root cluster,
which is a small number of nodes. On the other hand, specifying a junction-tree requires an elimination
order, the size of which can be as large² as $n$.

C. Optimal Block-Tree Graphs 

In this Section, we consider the problem of finding optimal block-tree graphs. In order to define an 
optimal block-tree graph, we first introduce the notion of block-width and block-treewidth. 

Definition 5 (Block-width): For an undirected graph $G = (V,E)$, the block-width of the graph with
respect to a root cluster $V_1$, $\mathrm{bw}(G, V_1)$, is the maximum cluster size in the block-tree graph constructed
using $V_1$ as the root cluster: $\mathrm{bw}(G, V_1) = \max_k \gamma_k$.

Definition 6 (Block-treewidth): The block-treewidth, $\mathrm{btw}(G)$, of an undirected graph $G = (V,E)$ is
the minimal block-width of a graph $G$ with respect to all root clusters: $\mathrm{btw}(G) = \min_{V_1 \subset V} \mathrm{bw}(G, V_1)$.

For example, $\mathrm{bw}(G, \{1\}) = 3$ for the graph in Fig. 1(a). By checking over all possible root clusters, it
is easy to see that the block-treewidth for the graph in Fig. 1(a) is also three. For Fig. 1(c), $\mathrm{bw}(G, \{1\}) = 2$,
which is also the block-treewidth of the graph. We can now define an optimal block-tree graph.

Definition 7 (Optimal block-tree graph): A block-tree graph for an undirected graph $G = (V, E)$ with
respect to a root cluster $V_1$ is optimal if the block-width with respect to $V_1$ is equal to the block-treewidth
of the graph, i.e., $\mathrm{bw}(G, V_1) = \mathrm{btw}(G)$.

We show in Section VII-A that the notion of optimality for block-tree graphs in Definition 7 is useful
when finding spanning block-trees, which are subgraphs with lower block-treewidth. Computing the
block-treewidth of a graph requires a search over all possible root clusters, which has complexity of
$O(2^n)$. This search space can be simplified since if we choose a $V_1$ such that $|V_1| > \lceil n/2 \rceil$, the
block-width of the graph will be $|V_1|$ itself. Thus, the search space can be restricted to root clusters of size
$\lceil n/2 \rceil$, which requires a search over $\binom{n}{\lceil n/2 \rceil}$ possible clusters. In comparison, computing the
treewidth requires finding an optimal elimination order, which requires a search over $n!$ possible
combinations. Since $n! \gg \binom{n}{\lceil n/2 \rceil}$, the search space of computing the treewidth is much larger than the
search space of computing the block-treewidth. However, the problem of computing the block-treewidth
is still computationally intractable as the search space grows exponentially as $n$ increases.
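
To give a feel for the three search-space sizes compared above, the sketch below evaluates $2^n$, $\binom{n}{\lceil n/2 \rceil}$, and $n!$ for a couple of values of $n$. The function name and the chosen values of $n$ are illustrative assumptions; `math.comb` requires Python 3.8 or later.

```python
import math

def search_space_sizes(n):
    """Sizes of the search spaces discussed in the text for a graph with n nodes."""
    half = math.ceil(n / 2)
    return {
        "all_root_clusters": 2 ** n,                  # every subset of V
        "root_clusters_of_half_size": math.comb(n, half),
        "elimination_orders": math.factorial(n),      # permutations of V
    }

for n in (9, 20):
    print(n, search_space_sizes(n))
```

Even for the nine-node graphs of Fig. 1, the number of elimination orders exceeds the number of half-size root clusters by more than three orders of magnitude.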

We now propose a simple heuristic to find an upper bound on the block-treewidth of a graph. Let
$G = (V, E)$ be an undirected graph with $n$ nodes. Instead of searching over all possible root clusters,
whose maximum size can be $\lceil n/2 \rceil$, we restrict the search space to smaller root clusters. For $n$ small, we
find the best root cluster of size two, and for $n$ large we find the best root cluster of size one. Given the
initial choice of the root cluster, we add nodes to this to see if the block-width can be lowered further.
For example, if the initial root cluster is $V_1$, we check over all $k \in V \setminus V_1$ and see if $\{V_1, k\}$ leads to a
lower block-width. We repeat this process until the block-width does not decrease further. For small $n$,
the complexity of this heuristic algorithm is $O(n^2)$, since we initially search over all clusters of size two.
For large $n$, the complexity is $O(n)$ since we search only over clusters of size one. Table I compares
upper bounds on the treewidth vs. upper bounds on the block-treewidth for some standard graphs used
in the literature³. The upper bound on the treewidth is computed using a software package⁴.

TABLE I
Upper bound on block-treewidth vs. upper bound on treewidth

Graph              Treewidth   Block-treewidth   Nodes   Edges
ship-ship-pp       8           8                 30      77
water              10          8                 32      123
fungiuk            4           4                 15      36
pathfinder-pp      7           6                 12      43
1b67               17          16                68      559
1bbz               28          23                57      543
1bkb               34          29                131     1485
1bkf               39          37                106     1264
1bx7               11          11                41      195
1en2               17          16                69      463
1on2               40          34                135     1527
n x n grid graph   n           n                 n^2     2n(n - 1)

²The exact size of the elimination order depends on the connectivity of the graph.
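
The heuristic described above can be sketched as a greedy search over root clusters. This is one possible reading, not the authors' code: it accepts the first added node that lowers the block-width, whereas the text does not say whether the first or the best improvement is taken. The helper `block_width` re-implements the forward and backward passes of Algorithm 1 in compact form, all function names are hypothetical, and the edge list for Fig. 1(c) is an assumption read off the factors in (4).

```python
from itertools import combinations

def adjacency(edges):
    adj = {}
    for s, t in edges:
        adj.setdefault(s, set()).add(t)
        adj.setdefault(t, set()).add(s)
    return adj

def block_width(adj, root):
    """Largest cluster size after the forward and backward passes for a given root cluster."""
    def split(nodes):
        remaining, groups = set(nodes), []
        while remaining:
            stack = [remaining.pop()]
            group = set(stack)
            while stack:
                for t in adj[stack.pop()] & remaining:
                    remaining.remove(t)
                    group.add(t)
                    stack.append(t)
            groups.append(group)
        return groups

    levels, visited, frontier = [[set(root)]], set(root), set(root)
    while len(visited) < len(adj):
        frontier = set().union(*(adj[s] for s in frontier)) - visited
        if not frontier:
            raise ValueError("graph must be connected")
        levels.append(split(frontier))
        visited |= frontier
    for i in range(len(levels) - 1, 1, -1):
        for child in levels[i]:
            touched = set().union(*(adj[t] for t in child))
            hit = [c for c in levels[i - 1] if c & touched]
            keep = [c for c in levels[i - 1] if not c & touched]
            levels[i - 1] = keep + [set().union(*hit)]
    return max(len(c) for level in levels for c in level)

def greedy_root(adj, start_size=1):
    """Best root of size `start_size`, then add nodes while the block-width decreases."""
    nodes = sorted(adj)
    best = min((frozenset(c) for c in combinations(nodes, start_size)),
               key=lambda r: block_width(adj, r))
    best_bw = block_width(adj, best)
    improved = True
    while improved:
        improved = False
        for k in nodes:
            if k in best:
                continue
            bw = block_width(adj, best | {k})
            if bw < best_bw:
                best, best_bw, improved = best | {k}, bw, True
    return best, best_bw

# Assumed edge list for Fig. 1(c).
FIG_1C = [(1, 2), (1, 3), (2, 4), (3, 4), (3, 6), (4, 6),
          (4, 7), (6, 7), (6, 8), (7, 9), (8, 9), (8, 5)]
root, bw = greedy_root(adjacency(FIG_1C), start_size=1)
print(sorted(root), bw)
```

The result is only an upper bound on the block-treewidth, since the greedy growth can stop at a local minimum.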

IV. Inference Using Block-Tree Graphs 

In this Section, we outline an algorithm for inference in undirected graphical models using block-tree
graphs. The algorithm is similar to belief propagation with the difference that message passing happens
between clusters of nodes instead of individual nodes. Let $x \in \mathbb{R}^n$ be a random vector defined on an
undirected graph $G = (V,E)$. Using $V_1$ as a root cluster, suppose we construct the block-tree graph
$\mathcal{G} = (\mathcal{C}, \mathcal{E})$ using Algorithm 1, where $\mathcal{C} = \{C_1, C_2, \ldots, C_l\}$ and $\gamma_k = |C_k|$. From (1), we know that $p(x)$
admits the factorization over cliques such that

$$p(x) = \frac{1}{Z} \prod_{C \in \mathcal{C}} \psi_C(x_C) , \tag{2}$$

where, as in (1), $\mathcal{C}$ in the product denotes the set of cliques. Using the block-tree graph, we can express
the factorization of $p(x)$ as

$$p(x) = \frac{1}{Z} \prod_{(i,j) \in \mathcal{E}} \Psi_{i,j}(x_{C_i}, x_{C_j}) , \tag{3}$$

where the factors $\Psi_{i,j}(x_{C_i}, x_{C_j})$ correspond to a product of potential functions taken from the
factorization in (2), where each $\psi_C(x_C)$ is mapped to a unique $\Psi_{i,j}(x_{C_i}, x_{C_j})$.

³The graphs were obtained from the database in people.cs.uu.nl/hansb/treewidthlib/
⁴See www.treewidth.com

As an example, consider a random vector $x \in \mathbb{R}^9$ defined on the graphical model in Fig. 1(c). The
block-tree graph is given in Fig. 2(d) such that $C_1 = \{1\}$, $C_2 = \{2,3\}$, $C_3 = \{4,6\}$, $C_4 = \{7,8\}$,
$C_5 = \{5\}$, and $C_6 = \{9\}$. Using Fig. 1(c), the joint probability distribution can be written as

$$p(x) = \psi_{1,2}\psi_{1,3}\psi_{2,4}\psi_{3,4}\psi_{3,6}\psi_{4,6}\psi_{4,7}\psi_{6,7}\psi_{6,8}\psi_{7,9}\psi_{8,9}\psi_{8,5} , \tag{4}$$

where we simplify $\psi_{i,j}(x_i, x_j)$ as $\psi_{i,j}$. We can rewrite (4) in terms of the block-tree graph as

$$p(x) = \Psi_{1,2}(x_{C_1}, x_{C_2}) \Psi_{2,3}(x_{C_2}, x_{C_3}) \Psi_{3,4}(x_{C_3}, x_{C_4}) \Psi_{4,5}(x_{C_4}, x_{C_5}) \Psi_{4,6}(x_{C_4}, x_{C_6}) ,$$

where $\Psi_{1,2}(x_{C_1}, x_{C_2}) = \psi_{1,3}\psi_{1,2}$, $\Psi_{2,3}(x_{C_2}, x_{C_3}) = \psi_{3,6}\psi_{2,4}\psi_{3,4}\psi_{4,6}$,
$\Psi_{3,4}(x_{C_3}, x_{C_4}) = \psi_{6,8}\psi_{4,7}\psi_{6,7}$, $\Psi_{4,5}(x_{C_4}, x_{C_5}) = \psi_{8,5}$, and
$\Psi_{4,6}(x_{C_4}, x_{C_6}) = \psi_{7,9}\psi_{8,9}$, so that every factor in (4) appears in exactly one $\Psi_{i,j}$.
Since the block-tree graph is a tree decomposition, all algorithms valid for tree-structured graphs can be
directly applied to block-tree graphs. Thus, we can now use the belief propagation algorithm discussed in
Section II-B to do inference on block-tree graphs. The steps involved are similar, with an additional step to
marginalize the joint distributions over each cluster:

1) For any cluster, say $C_1$, identify its leaves.

2) Starting from the leaves, pass messages along each edge until we reach the root cluster $C_1$:

$$m_{i \to j}(x_{C_j}) = \sum_{x_{C_i}} \Psi_{i,j}(x_{C_i}, x_{C_j}) \prod_{e \in \mathcal{N}(i) \setminus j} m_{e \to i}(x_{C_i}) , \tag{5}$$

where $\mathcal{N}(i)$ is the set of neighboring clusters of $C_i$ and $m_{i \to j}(x_{C_j})$ is the message passed from
cluster $C_i$ to $C_j$.

3) Once all messages have been communicated from the leaves to the root, pass messages from the
root back to the leaves.

4) After the messages reach the leaves, the joint distribution for each cluster is given as

$$p(x_{C_i}) = \prod_{j \in \mathcal{N}(i)} m_{j \to i}(x_{C_i}) . \tag{6}$$

5) To find the distribution of each node, marginalize $p(x_{C_i})$.
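The steps above can be illustrated on the nine-node example with a minimal sketch. It makes several assumptions that are not in the text: every variable is binary, the pairwise potential `psi` is invented for illustration, potentials inside a cluster are attached to the edge toward the parent cluster, and the product of incoming messages is normalized before marginalizing. A brute-force sum over all 2^9 states is included only as a check.

```python
from functools import lru_cache
from itertools import product

# Clusters and block-tree edges of the example (root cluster C1).
clusters = {1: (1,), 2: (2, 3), 3: (4, 6), 4: (7, 8), 5: (5,), 6: (9,)}
tree = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5, 6], 5: [4], 6: [4]}
parent = {2: 1, 3: 2, 4: 3, 5: 4, 6: 4}
edges = [(1, 2), (1, 3), (2, 4), (3, 4), (3, 6), (4, 6),
         (4, 7), (6, 7), (6, 8), (7, 9), (8, 9), (8, 5)]
cluster_of = {n: c for c, ns in clusters.items() for n in ns}


def psi(u, v, a, b):
    """Hypothetical positive pairwise potential on binary states a, b."""
    return 1.0 + 0.5 * a + ((u + v) % 3 + 1) * a * b


# Map every psi_{u,v} to exactly one block-tree edge {i, j}.
assigned = {}
for u, v in edges:
    cu, cv = cluster_of[u], cluster_of[v]
    key = frozenset((cu, cv)) if cu != cv else frozenset((cu, parent[cu]))
    assigned.setdefault(key, []).append((u, v))


def states(c):
    """All joint binary states of the nodes in cluster c."""
    return list(product((0, 1), repeat=len(clusters[c])))


def big_psi(i, j, xi, xj):
    """Cluster factor Psi_{i,j}: product of the potentials assigned to edge {i, j}."""
    vals = dict(zip(clusters[i] + clusters[j], xi + xj))
    out = 1.0
    for u, v in assigned.get(frozenset((i, j)), []):
        out *= psi(u, v, vals[u], vals[v])
    return out


@lru_cache(maxsize=None)
def message(i, j):
    """Message m_{i->j} of (5), returned as a dict over the states of C_j."""
    msg = {}
    for xj in states(j):
        total = 0.0
        for xi in states(i):
            term = big_psi(i, j, xi, xj)
            for e in tree[i]:
                if e != j:
                    term *= message(e, i)[xi]
            total += term
        msg[xj] = total
    return msg


def node_marginals():
    """P(x_n = 1) for every node, from normalized cluster beliefs as in (6)."""
    marg = {}
    for c in clusters:
        belief = {x: 1.0 for x in states(c)}
        for j in tree[c]:
            for x in belief:
                belief[x] *= message(j, c)[x]
        z = sum(belief.values())
        for pos, n in enumerate(clusters[c]):
            marg[n] = sum(b for x, b in belief.items() if x[pos] == 1) / z
    return marg


def brute_force_marginals():
    """Reference marginals by enumerating all joint states."""
    nodes = sorted(cluster_of)
    weight = {}
    for x in product((0, 1), repeat=len(nodes)):
        vals = dict(zip(nodes, x))
        w = 1.0
        for u, v in edges:
            w *= psi(u, v, vals[u], vals[v])
        weight[x] = w
    z = sum(weight.values())
    return {n: sum(w for x, w in weight.items() if x[k] == 1) / z
            for k, n in enumerate(nodes)}
```

Calling `node_marginals()` and `brute_force_marginals()` should give the same node marginals up to rounding; the recursion in `message` plays the role of the leaves-to-root and root-to-leaves passes, since each directed message is computed once and cached.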

We now analyze the complexity of doing inference using block-tree graphs. Assume $x_s \in \Omega$, where
$s \in V$, such that $|\Omega| = K$. From (5), to pass a message from $C_i$ to $C_j$, for each $x_{C_j} \in \Omega^{|C_j|}$, we
require $K^{|C_i|}$ additions. Thus, this step will require $K^{|C_i| + |C_j|}$ additions, since we need
to compute $m_{i \to j}(x_{C_j})$ for all possible values that $x_{C_j}$ takes. Thus, the complexity of inference is
given as follows:

Theorem 4 (Complexity of inference using block-tree graphs): For a random vector $x \in \mathbb{R}^n$ defined
on an undirected graph $G = (V, E)$ with each $x_s$ taking values in $\Omega$, the complexity of performing
inference is exponential in the maximum sum of cluster sizes of adjacent clusters.
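As a small worked check of Theorem 4, the sketch below counts the additions for one leaves-to-root pass on the block-tree of the nine-node example, assuming binary variables (K = 2). The function name `message_cost` is hypothetical.

```python
# Clusters and block-tree edges of the nine-node example.
clusters = {1: (1,), 2: (2, 3), 3: (4, 6), 4: (7, 8), 5: (5,), 6: (9,)}
tree_edges = [(1, 2), (2, 3), (3, 4), (4, 5), (4, 6)]


def message_cost(clusters, tree_edges, K):
    """Additions per message, K**(|Ci| + |Cj|), and the dominating exponent."""
    per_edge = {(i, j): K ** (len(clusters[i]) + len(clusters[j]))
                for i, j in tree_edges}
    exponent = max(len(clusters[i]) + len(clusters[j]) for i, j in tree_edges)
    return exponent, per_edge


exponent, per_edge = message_cost(clusters, tree_edges, K=2)
print(exponent, sum(per_edge.values()))  # prints: 4 56
```

The exponent 4 comes from the adjacent pairs (C2, C3) and (C3, C4), each with two nodes per cluster; the return pass from the root to the leaves costs the same again.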

Another way to realize Theorem 4 is to form a junction-tree using the block-tree graph. It is clear that,
in general, using block-tree graphs for inference is computationally less efficient than using
junction-trees. However, for complete block-graphs, an example of which is shown in Fig. 2(b), we see that
both the junction-tree and the block-tree graph have the same computational complexity. Thus, complete
block-graphs are attractive graphical models for the block-tree graph framework when the goal is to
do exact inference. In Section VII, we illustrate the advantage of using the block-tree graph framework
on arbitrary graphical models in the context of approximate inference in graphical models.

V. Boundary Valued Graphs 

In this Section, we specialize our block-tree graph framework to boundary valued graphs, which are
known as chain graphs in the literature. We are only concerned with boundary valued graphs where we
have one set of boundary nodes connected in a directed manner to nodes in an undirected graph. The
motivation for using these particular types of boundary valued graphs is in modeling physical phenomena
whose underlying statistics are governed by partial differential equations satisfying boundary conditions
[29], [38]. A common example is in texture modeling, where the boundary edges are often assumed to
satisfy either periodic, Neumann, or Dirichlet boundary conditions [20]. If the boundary values are zero,
the graph will become undirected, and we can then use the block-tree graph framework in Section III.

Fig. 3. An example of a boundary valued graph: (a) boundary valued graph; (b) boundary valued graph as a
block-tree graph.

Let $G = (\{V, \partial V\}, E_V \cup E_{\partial V} \cup E_{V, \partial V})$ be a boundary valued graph, where $V$ is a set of nodes,
called interior nodes, and $\partial V$ is another set of nodes referred to as boundary nodes. We assume that the
nodes in $V$ and $\partial V$ are connected by undirected edges, denoted by the edge sets $E_V$ and $E_{\partial V}$, and that
there exist directed edges between the nodes of $V$ and $\partial V$. To construct a block-tree graph, we first
need to choose a root cluster $C_1$. As discussed in Section III, any choice of clusters can lead to a recursive
representation; however, it is natural for boundary valued graphs to initiate at the boundary. For this
reason, we let the root cluster be $\partial V$ and then use Algorithm 1 to construct a block-tree graph. By
choosing the boundary values first, we convert a boundary valued problem into an initial valued problem.
An example of a boundary valued graph and its block-tree graph is shown in Fig. 3. We let $C_1$ be the
boundary nodes $\{a, b, c, d\}$ and subsequently construct $C_2 = \{1, 3, 7, 9\}$, $C_3 = \{2, 4, 6, 8\}$, and $C_4 = \{5\}$,
so that we get the chain structured graph in Fig. 3(b). The probability distribution of $x$ defined on this
graph can be written as

$$p(x) = P(a)P(b)P(c)P(d)\, \psi_{abcd}\, \psi_{1,a}\psi_{3,b}\psi_{7,c}\psi_{9,d}\, \psi_{1:9} , \quad \text{where} \tag{7}$$

$$\psi_{1:9} = \psi_{1,2}\psi_{1,4}\psi_{2,5}\psi_{2,3}\psi_{3,6}\psi_{5,6}\psi_{4,5}\psi_{6,9}\psi_{8,9}\psi_{5,8}\psi_{7,8}\psi_{4,7} ,$$

$$\psi_{abcd} = \Big[ \sum_{1:9} \psi_{1,a}\psi_{3,b}\psi_{7,c}\psi_{9,d}\, \psi_{1:9} \Big]^{-1} . \tag{8}$$

Using the block-tree graph, we write the probability distribution as

$$p(x) = \Psi_{1,2}(x_{C_1}, x_{C_2}) \Psi_{2,3}(x_{C_2}, x_{C_3}) \Psi_{3,4}(x_{C_3}, x_{C_4}) , \quad \text{where} \tag{9}$$

$$\Psi_{1,2}(x_{C_1}, x_{C_2}) = P(a)P(b)P(c)P(d)\, \psi_{abcd}\, \psi_{1,a}\psi_{3,b}\psi_{7,c}\psi_{9,d} \tag{10}$$

$$\Psi_{2,3}(x_{C_2}, x_{C_3}) = \psi_{1,2}\psi_{1,4}\psi_{3,2}\psi_{3,6}\psi_{7,4}\psi_{7,8}\psi_{6,9}\psi_{8,9} \tag{11}$$

$$\Psi_{3,4}(x_{C_3}, x_{C_4}) = \psi_{2,5}\psi_{4,5}\psi_{6,5}\psi_{8,5} . \tag{12}$$

Notice that (9) did not require the calculation of a normalization constant $Z$. This calculation is hidden
in the potential function $\psi_{abcd}$ given by (8), which normalizes the interior potentials for each value of
the boundary nodes. We note that the results presented here extend our results for deriving recursive
representations for Gaussian lattice models with boundary conditions in [30] to arbitrary (non-Gaussian)
undirected graphs with boundary values.

Fig. 4. Block-tree graph and related shift operators: (a) scales; (b) block-tree.
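For the boundary valued example of Fig. 3, the clusters $C_1, \ldots, C_4$ can be reproduced by growing successive neighbors from the boundary. The sketch below is an illustration, not Algorithm 1 itself: it treats the directed boundary edges as undirected when layering, and it assumes that in this example each layer is kept as a single cluster. The function name is hypothetical; the edge list is read off the potentials in (7).

```python
def successive_neighbors(adj, root):
    """Layers of nodes at increasing distance from the root cluster."""
    layers = [set(root)]
    seen = set(root)
    while True:
        nxt = {v for u in layers[-1] for v in adj[u]} - seen
        if not nxt:
            return layers
        layers.append(nxt)
        seen |= nxt


# Interior edges from psi_{1:9} and boundary edges from psi_{1,a}, ..., psi_{9,d}.
edge_list = [(1, 2), (1, 4), (2, 5), (2, 3), (3, 6), (5, 6), (4, 5), (6, 9),
             (8, 9), (5, 8), (7, 8), (4, 7),
             ("a", 1), ("b", 3), ("c", 7), ("d", 9)]
adj = {}
for u, v in edge_list:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

layers = successive_neighbors(adj, {"a", "b", "c", "d"})
for k, layer in enumerate(layers, start=1):
    print(f"C{k} =", sorted(layer, key=str))
```

The printed layers are {a, b, c, d}, {1, 3, 7, 9}, {2, 4, 6, 8}, and {5}, matching the chain of clusters used in (9)-(12).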

VI. Gaussian Graphical Models 

In this Section, we specialize our block-tree graph framework to Gaussian graphical models. Section
VI-A reviews Gaussian graphical models and introduces relevant notation. Section VI-B derives linear
state-space representations for undirected Gaussian graphical models. Using these state-space
representations, we derive recursive estimators in Section VI-C.

A. Preliminaries 

Let $x \in \mathbb{R}^n$ be a Gaussian random vector defined on a graph $G = (V, E)$, $V = \{1, 2, \ldots, n\}$, and
$x_k \in \mathbb{R}$. Without loss of generality, we assume that $x$ has zero mean and covariance $\Sigma$. From ID, it is
well known that the inverse of the covariance is sparse and the nonzero pattern in $J = \Sigma^{-1}$ determines
the edges of the graph $G$. In the literature, $J$ is often referred to as the information matrix or the potential
matrix.

^For simplicity, we assume $x_k \in \mathbb{R}$; however, our results can be easily generalized when $x_k \in \mathbb{R}^d$, $d \geq 2$.

Suppose we construct a block-tree graph $\mathcal{G} = (\mathcal{C}, \mathcal{E})$ from the undirected graph $G = (V, E)$. Let $P$ be
a permutation matrix which maps $V$ to $\mathcal{C} = \{C_1, C_2, \ldots, C_l\}$, i.e.,

$$x(\mathcal{C}) = P x(V) = P x .$$

The covariance of $x(\mathcal{C})$ is

$$E[x(\mathcal{C}) x(\mathcal{C})^T] = P \Sigma P^T = [\Sigma_P(i,j)]_l \tag{13}$$

$$E[x(C_i) x(C_j)^T] = \Sigma_P(i,j) , \quad \text{for } i, j = 1, \ldots, l , \tag{14}$$

where the notation $F = [F(i,j)]_l$ refers to the blocks of the matrix $F$ for $i, j = 1, \ldots, l$. We now define
some notation that will be useful in deriving the state-space representation and the recursive estimators.
The notation is borrowed from standard notation used for tree-structured graphs in Q, [5], [39]. For the
block-tree graph, we have defined $C_1$ as the root cluster. The other clusters of the block-tree graph can
be partially ordered according to their scale, which is the distance of a cluster to the root cluster $C_1$.
This distance is defined as the number of edges in the path connecting two clusters. Since we have a
tree graph over the clusters, a path connecting two clusters is unique. Thus, the scale of $C_1$ is zero. All
the neighbors of $C_1$ will have scale one. For any $C_k$ at scale $s$, define $\{C_{\Gamma_1(k)}, \ldots, C_{\Gamma_q(k)}\}$ as the set
of clusters connected to $C_k$ at scale $s + 1$ and $C_{\Upsilon(k)}$ as the cluster connected to $C_k$ at scale $s - 1$. Let
$\mathcal{T}(C_k)$ be all the clusters, including $C_k$, at scale greater than or equal to $s$ and at a distance greater
than or equal to zero:

$$\mathcal{T}(C_k) = \{C_i : d(C_k, C_i) \geq 0 \text{ and } \mathrm{scale}(C_i) \geq s\} , \tag{15}$$

where $\mathrm{scale}(C_i)$ is the scale of the cluster $C_i$. Fig. 4(a) shows the relevant notations on a block-tree
graph. Fig. 4(b) shows an example of a block-tree graph where the root cluster is $C_1 = \{7\}$, the cluster
at scale one is $C_2 = \{7, 8\}$, the clusters at scale two are $C_3 = \{1\}$ and $C_4 = \{5, 9\}$, the cluster at scale
three is $C_5 = \{2, 6\}$, and the cluster at scale four is $C_6 = \{3\}$. For $C_2$, we have that $C_{\Gamma_1(2)} = C_3$,
$C_{\Gamma_2(2)} = C_4$, and $C_{\Upsilon(2)} = C_1$. In the next Section, we derive a state-space representation for Gaussian
random vectors defined on undirected graphs using block-tree graphs.
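The scale, parent, and children notation above can be computed with a breadth-first pass over the tree of clusters. The sketch below uses cluster indices only and a tree shaped like the Fig. 4(b) example; attaching $C_5$ to $C_4$ is an assumption, since the text gives only the scales. The function names are hypothetical, and `subtree` returns the clusters reachable from $C_k$ without passing through its parent.

```python
from collections import deque


def scales_and_relatives(tree, root):
    """Scale of each cluster, its parent cluster, and its children clusters."""
    scale = {root: 0}
    parent = {root: None}
    children = {c: [] for c in tree}
    queue = deque([root])
    while queue:
        c = queue.popleft()
        for nb in tree[c]:
            if nb not in scale:
                scale[nb] = scale[c] + 1
                parent[nb] = c
                children[c].append(nb)
                queue.append(nb)
    return scale, parent, children


def subtree(children, k):
    """C_k together with all of its descendants."""
    out, stack = set(), [k]
    while stack:
        c = stack.pop()
        out.add(c)
        stack.extend(children[c])
    return out


# Cluster tree: C1 - C2, C2 - C3, C2 - C4, C4 - C5 (assumed), C5 - C6.
tree = {1: [2], 2: [1, 3, 4], 3: [2], 4: [2, 5], 5: [4, 6], 6: [5]}
scale, parent, children = scales_and_relatives(tree, root=1)
print(scale)                      # scales 0, 1, 2, 2, 3, 4 for C1..C6
print(children[2], parent[2])     # children of C2 and its parent
print(sorted(subtree(children, 4)))
```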

B. State-Space Representation 

Given a block-tree graph, we now derive state-space representations on the tree structure. There are
two types of representations we can define: one in which we are given $x(C_k)$ at scale $s$ and want
to compute $x(C_{\Upsilon(k)})$ at scale $s - 1$, and another in which we are given $x(C_{\Upsilon(k)})$ at scale $s - 1$ and
want to compute $x(C_k)$ at scale $s$.

Theorem 5: Let $x = \{x_k \in \mathbb{R} : k \in V\}$ be a random vector defined over a graph $G = (V, E)$ and
let $\mathcal{G} = (\mathcal{C}, \mathcal{E})$ be a block-tree graph obtained using a root cluster $C_1$. We have the following linear
representations:

$$x(C_k) = A_k x(C_{\Upsilon(k)}) + u(C_k) \tag{16}$$

$$x(C_{\Upsilon(k)}) = F_k x(C_k) + w(C_k) , \tag{17}$$

where $u(C_k)$ is white Gaussian noise uncorrelated with $x(C_{\Upsilon(k)})$, $w(C_k)$ is non-white Gaussian noise
uncorrelated with $x(C_k)$, and

$$A_k = \Sigma_P(C_k, C_{\Upsilon(k)}) \left[ \Sigma_P(C_{\Upsilon(k)}, C_{\Upsilon(k)}) \right]^{-1} \tag{18}$$

$$Q_k^u = E[u(C_k) u^T(C_k)] = \Sigma_P(C_k, C_k) - A_k \Sigma_P(C_{\Upsilon(k)}, C_k) \tag{19}$$

$$F_k = \Sigma_P(C_{\Upsilon(k)}, C_k) \left[ \Sigma_P(C_k, C_k) \right]^{-1} \tag{20}$$

$$Q_k^w = E[w(C_k) w^T(C_k)] = \Sigma_P(C_{\Upsilon(k)}, C_{\Upsilon(k)}) - F_k \Sigma_P(C_k, C_{\Upsilon(k)}) , \tag{21}$$

where $\Sigma_P(i, j)$ is a block from the covariance of $x(\mathcal{C})$ defined in (14).

Proof: We first derive (16). Consider the set of clusters $\mathcal{C} \setminus \mathcal{T}(C_k)$. From the global Markov property,
stated in Definition [H, we have

$$\hat{x}(C_k) = E[x(C_k) \mid \{x(C_i) : C_i \in \mathcal{C} \setminus \mathcal{T}(C_k)\}] = E[x(C_k) \mid x(C_{\Upsilon(k)})] \tag{22}$$

$$= \Sigma_P(C_k, C_{\Upsilon(k)}) \left[ \Sigma_P(C_{\Upsilon(k)}, C_{\Upsilon(k)}) \right]^{-1} x(C_{\Upsilon(k)}) = A_k x(C_{\Upsilon(k)}) , \tag{23}$$

where we get (23) using the Gauss-Markov theorem BOl for computing the conditional mean. Define
the error $u(C_k)$ as

$$u(C_k) = x(C_k) - \hat{x}(C_k) = x(C_k) - A_k x(C_{\Upsilon(k)}) . \tag{24}$$

It is clear that $u(C_k)$ is Gaussian. Further, by the orthogonality properties of the minimum-mean squared
error (mmse) estimates, $u(C_k)$ is white. The variance of $u(C_k)$ is computed as follows:

$$E[u(C_k) u^T(C_k)] = E[(x(C_k) - \hat{x}(C_k))(x(C_k) - \hat{x}(C_k))^T] \tag{25}$$

$$= E[(x(C_k) - \hat{x}(C_k)) x^T(C_k)] \tag{26}$$

$$= \Sigma_P(C_k, C_k) - A_k \Sigma_P(C_{\Upsilon(k)}, C_k) . \tag{27}$$

To go from (25) to (26), we use the orthogonality of $u(C_k)$. This gives us the $Q_k^u$ in (19). Equation
(17) can be either derived in a similar manner or alternatively by using the results of BTI on backwards
Markovian models. ■
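As a sanity check on (18)-(21), consider the simplest possible case: a chain whose clusters are single scalar nodes. The sketch below assumes a covariance with entries rho^|i-j| (a stationary first-order Gauss-Markov chain), which is an illustrative choice and not taken from the text; with scalar clusters the matrix inverses reduce to divisions.

```python
rho, n = 0.8, 5
# Covariance of a scalar Gauss-Markov chain, Sigma[i][j] = rho ** |i - j|.
Sigma = [[rho ** abs(i - j) for j in range(n)] for i in range(n)]


def forward_model(Sigma, k, up):
    """A_k and Q_k^u of (18)-(19) for scalar clusters k and parent 'up'."""
    A = Sigma[k][up] / Sigma[up][up]
    Qu = Sigma[k][k] - A * Sigma[up][k]
    return A, Qu


def backward_model(Sigma, k, up):
    """F_k and Q_k^w of (20)-(21) for scalar clusters k and parent 'up'."""
    F = Sigma[up][k] / Sigma[k][k]
    Qw = Sigma[up][up] - F * Sigma[k][up]
    return F, Qw


def noise_cross_cov(Sigma, k, up, j):
    """E[u(C_k) x_j] for u(C_k) = x_k - A_k x_up."""
    A, _ = forward_model(Sigma, k, up)
    return Sigma[k][j] - A * Sigma[up][j]


A, Qu = forward_model(Sigma, k=3, up=2)
print(A, Qu)                                    # approximately 0.8 and 0.36
print([noise_cross_cov(Sigma, 3, 2, j) for j in range(3)])  # all close to zero
```

The driving noise of node 3 is uncorrelated with nodes 0, 1, and 2, which is the orthogonality used in the proof. In a chain every cluster has a single child, so the backward noise is also white here; the non-white behaviour of $w(C_k)$ appears only when a cluster has several children.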

The driving noise in (17), $w(C_k)$, is not white noise. This happens because for each $C_{\Upsilon(k)}$, there
can be more than one cluster $C_j$ such that $C_{\Upsilon(j)} = C_{\Upsilon(k)}$. Using the state-space representations, we can
easily recover standard recursive algorithms, an example of which is shown in the next Section, where
we consider the problem of estimation over Gaussian graphical models.

C. Recursive Estimation

Let $x \in \mathbb{R}^n$ be a zero mean Gaussian random vector defined on an undirected graph $G = (V, E)$. Let
$\Sigma$ be the covariance of $x$ and let $J = \Sigma^{-1}$. Suppose we collect noisy observations of $x$ such that

$$y_s = H_s x_s + n_s , \tag{28}$$

where $n_s \sim \mathcal{N}(0, R_s)$ is white Gaussian noise independent of $x_s$ and $H_s$ is known. Given
$y = [y_1, \ldots, y_n]^T$, we want to find the minimum-mean squared error (mmse) estimate of $x$, which is $E[x|y]$.
From the Gauss-Markov theorem B0]|, we have

$$\hat{x} = E[x|y] = E[x y^T] \left( E[y y^T] \right)^{-1} y \tag{29}$$

$$= \Sigma H^T \left( H \Sigma H^T + R \right)^{-1} y , \tag{30}$$

where $H$ and $R$ are diagonal matrices with diagonals $H_s$ and $R_s$, respectively. Using (30) to compute
$\hat{x}$ requires inversion of an $n \times n$ matrix, which has complexity $O(n^3)$. An alternate method is to use
the state-space representations in Theorem 5 and derive standard Kalman filters and recursive smoothers
using pT|, in IS. This approach will require inversion of a btw(G) x btw(G) matrix, where btw(G) is
the block-treewidth of the graph. Another approach to computing $\hat{x}$ is to use the equations in p^l], where
they derive estimation equations for Gaussian tree distributions given $J = \Sigma^{-1}$. The generalization
to block-tree graphs is trivial and will only involve identifying appropriate blocks from the inverse of
the covariance matrix. Thus, by converting an arbitrary graph into a block-tree graph, we are able to
recover algorithms for recursive estimation of Gaussian graphical models. For graphs with high
block-treewidth, however, computing mmse estimates is computationally intractable. For this reason, we
propose an efficient approximate estimation algorithm in the next Section.

*The results in [4] are on dyadic trees; however, they can be easily generalized to arbitrary trees.

VII. Approximate Estimation

In this Section, we use the block-tree graph framework to derive approximate estimation algorithms
for Gaussian graphical models. The need for approximate estimation arises because estimation/inference
in graphical models is computationally intractable for graphical models with large treewidth or large
block-treewidth. The approach we use for approximate estimation is based on decomposing the original
graphical model into computationally tractable subgraphs and using the subgraphs for estimation; see liSll,
[34], [35] for a general class of algorithms. Traditional approaches to finding subgraphs involve using
spanning trees, which are tree-structured subgraphs. We propose to use spanning block-trees, which
are block-tree graphs with low block-treewidth. Section VII-A outlines a heuristic algorithm for finding
maximum weight spanning block-trees. We review the matrix splitting approach to approximate estimation
in Section VII-B and show how spanning block-trees can be used instead of spanning trees. Section VII-C
shows experimental results.

A. Spanning Block-Trees 

Let $G = (V, E)$ be an undirected graph. We define a $B$-width spanning block-tree as a subgraph of $G$
with block-treewidth at most $B$. If $B = 1$, the spanning block-tree becomes a spanning tree. For $B > 1$,
we want to remove edges from the graph $G$ until we get a block-tree graph with block-width less than
or equal to $B$. To quantify each edge, we associate a weight $w_{i,j}$ with each $(i, j) \in E$. If $w_{i,j}$ is the
mutual information between nodes $i$ and $j$, finding an optimal spanning block-tree by removing edges
reduces to minimizing the Bethe free energy P3l, fl4l. For the purpose of approximate estimation in
Gaussian graphs, the authors in P31 proposed weights that provide a measure of the error-reduction capacity
of each edge in the graph. We use these weights when finding spanning block-trees for the purpose of
approximate estimation in Section VII-B. If a graph is not weighted, we can assign all the weights to
be one. In this case, finding a maximum weight $B$-width spanning block-tree is equivalent to finding a
$B$-width spanning block-tree which retains the largest number of edges in the final subgraph.

Algorithm 2 Constructing Maximum Weight Spanning Block-Trees

1: procedure MWSpanningBlockTree(G, W, B)
2:     $\mathcal{G}$ ← FindOptimalBlockTree(G); $\mathcal{G} = (\mathcal{C}, \mathcal{E})$
3:     $\mathcal{C}^B$ ← SplitCluster($\mathcal{G}$, W); see Section VII-A1
4:     $\mathcal{G}^B$ ← MWST($\{C_i^B\}$, W); see Section VII-A2
5: end procedure



Algorithm 2 outlines our approach to finding maximum weight spanning block-trees. The input to the
algorithm is an undirected graph $G$, a weight matrix $W = [w_{i,j}]_n$, and the desired block-treewidth $B$.
The output of the algorithm is a block-tree graph $\mathcal{G}^B = (\mathcal{C}^B, \mathcal{E}^B)$, where btw($\mathcal{G}^B$) $\leq B$. Algorithm 2 is a
greedy algorithm for finding spanning block-trees since solving this problem optimally is combinatorially
complex. We first find the optimal block-tree graph $\mathcal{G}$ for the undirected graph $G$ using the algorithm
outlined in Section III-C (Line 2). The next steps in the algorithm are: (i) splitting $\mathcal{C}$ into clusters $\mathcal{C}^B$
(Line 3), and (ii) finding edges $\mathcal{E}^B$ connecting two clusters so that $\mathcal{G}^B = (\mathcal{C}^B, \mathcal{E}^B)$ is a block-tree graph
(Line 4).

Fig. 5. (a) Notation used in Section VII-A1. (b) An example showing how clusters are split into smaller clusters. The goal
is to split the cluster with red nodes into clusters with maximum size 2 and 3. All edges are assumed to have weight one.
(c) Weighted graph constructed using $\eta_{rs}$ given in (31). (d) Smaller clusters formed for $B = 2$ and $B = 3$. For $B = 2$, we
first choose the middle two nodes since the weight between these two nodes in (c) is maximum. The remaining nodes are
unconnected in (c), so we assign them to individual clusters. For $B = 3$, we again choose the middle two clusters first and then
add another cluster so that the sum of weights is maximal.



1) Splitting Clusters: Since the maximum size of the cluster in $\mathcal{C}$ is greater than $B$, we first identify
all clusters $\{C_{k_1}, \ldots, C_{k_m}\}$ such that $|C_{k_i}| > B$ for all $i = 1, \ldots, m$. Next, we split each $C_{k_i}$ into
smaller clusters so that $C_{k_i}^B = \{C_{k_i}^1, C_{k_i}^2, \ldots, C_{k_i}^h\}$, where $|C_{k_i}^j| \leq B$ for $j = 1, \ldots, h$. This splitting
must be done in such a way that the nodes in the same cluster retain edges in the original graph with
maximum weight. The algorithm we propose for splitting the clusters is as follows:

a) For each $C_{k_i}$, let $\Gamma(C_{k_i}) = \{C_{\Gamma_1(k_i)}, C_{\Gamma_2(k_i)}, \ldots, C_{\Gamma_q(k_i)}\}$ be all the clusters connected to $C_{k_i}$ at the
next scale and let $C_{\Upsilon(k_i)}$ be the cluster connected to $C_{k_i}$ at the previous scale. We assume that we have
already split $C_{\Upsilon(k_i)}$ such that $C_{\Upsilon(k_i)}^B = \{C_{\Upsilon(k_i)}^1, \ldots, C_{\Upsilon(k_i)}^w\}$, where $|C_{\Upsilon(k_i)}^j| \leq B$ for $j = 1, \ldots, w$.
The notations introduced are shown in Fig. 5(a).

b) To split the cluster $C_{k_i}$, we first identify nodes in $C_{k_i}$ which can be clustered together. For any
two distinct nodes $r, s \in C_{k_i}$, if both $r$ and $s$ have edges in any one of the clusters $C_{\Upsilon(k_i)}^j$, for any
$j = 1, \ldots, w$, then nodes $r$ and $s$ can be clustered together. For example, in Fig. 5(a), nodes $r$ and $r'$
cannot be clustered together, whereas nodes $r$ and $s$ can be clustered together.

c) We now associate weights $\eta_{rs}$ between nodes of $C_{k_i}$ that can be clustered together:

$$\eta_{rs} = w_{rs} + \sum_{t \in \mathcal{N}(r) \cap \mathcal{N}(s) \cap \Gamma(C_{k_i})} (w_{rt} + w_{ts}) . \tag{31}$$

The intuition behind constructing the weights in (31) is to cluster nodes together that are connected
to the same cluster at the next scale or are connected to each other. An example of constructing $\eta_{rs}$
is shown in Fig. 5(b) and Fig. 5(c), where we assume the weights on each edge are one.

d) Using $\eta_{rs}$, we construct a weighted graph on the nodes in $C_{k_i}$. To construct smaller clusters, we first
choose two nodes for which $\eta_{rs}$ is maximum. We keep adding nodes to this cluster by choosing
nodes connected to at least one node in this cluster until the cluster size is $B$. If no other node is
connected to the new cluster, we start building another cluster. For example, given the weighted graph
in Fig. 5(c), we construct clusters in Fig. 5(d) with $B = 2$ and $B = 3$.

After applying Steps (a)-(d) on all $C_{k_i}$ such that $|C_{k_i}| > B$, we get the new clusters $\mathcal{C}^B$.

Fig. 6. Example of finding a maximum weight 2-width spanning block-tree using the weighted graph in (a). The clusters of
the block-tree graph are {a}, {b, c, d}, {e, f, g, h, i}, {j, k, l}, shown in (b). Fig. 6(b) shows the weights $\eta_{rs}$ for each cluster.
Fig. 6(c) shows the smaller clusters obtained by retaining the edges with maximum weight in the weighted graph of (b). Note
that the graph in (c) is a weighted block-tree graph. The red edges in (c) correspond to the maximum weight spanning tree.
The final subgraph corresponding to the spanning block-tree in (c) is shown in (d).

Fig. 7. (a) A 4 x 4 grid graph. (b)-(e) A collection of 2-width spanning block-trees. (f)-(g) A collection of 3-width spanning
block-trees.
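Steps (b)-(d) of the cluster splitting can be sketched as below. This is a hedged illustration rather than the authors' implementation: the function names, the tie-breaking rule (largest tuple wins), the choice to let any pair be clustered when no parent cluster is given, and the small test graph are all assumptions. The test graph imitates the situation described for Fig. 5: a four-node cluster 1-2-3-4 with unit weights, a single parent node 0 adjacent to all four, and one next-scale node 5 adjacent to the two middle nodes.

```python
from itertools import combinations


def make_graph(edges):
    """Symmetric weight dictionary w[u][v] from (u, v, weight) triples."""
    w = {}
    for u, v, wt in edges:
        w.setdefault(u, {})[v] = wt
        w.setdefault(v, {})[u] = wt
    return w


def eta_weights(cluster, w, next_scale, parent_parts):
    """Weights eta_rs of (31) for the pairs allowed by step (b)."""
    eta = {}
    for r, s in combinations(sorted(cluster), 2):
        # Step (b): r and s must both touch the same (already split) parent part.
        allowed = not parent_parts or any(
            part & set(w[r]) and part & set(w[s]) for part in parent_parts)
        if not allowed:
            continue
        common = set(w[r]) & set(w[s]) & next_scale
        value = w[r].get(s, 0.0) + sum(w[r][t] + w[t][s] for t in common)
        if value > 0:
            eta[(r, s)] = value
    return eta


def split_cluster(cluster, eta, B):
    """Step (d): grow clusters of size at most B along the heaviest eta edges."""
    def wt(u, v):
        return eta.get((min(u, v), max(u, v)), 0.0)

    remaining, parts = set(cluster), []
    while remaining:
        pairs = [(wt(r, s), r, s)
                 for r, s in combinations(sorted(remaining), 2) if wt(r, s) > 0]
        if B < 2 or not pairs:
            # Nothing left to merge: remaining nodes become singleton clusters.
            parts.extend({u} for u in sorted(remaining))
            break
        _, r, s = max(pairs)
        part = {r, s}
        while len(part) < B:
            grow = [(sum(wt(u, p) for p in part), u)
                    for u in remaining - part if any(wt(u, p) > 0 for p in part)]
            if not grow:
                break
            part.add(max(grow)[1])
        parts.append(part)
        remaining -= part
    return parts


w = make_graph([(0, 1, 1.0), (0, 2, 1.0), (0, 3, 1.0), (0, 4, 1.0),
                (1, 2, 1.0), (2, 3, 1.0), (3, 4, 1.0),
                (2, 5, 1.0), (3, 5, 1.0)])
eta = eta_weights({1, 2, 3, 4}, w, next_scale={5}, parent_parts=[{0}])
print(eta)
print(split_cluster({1, 2, 3, 4}, eta, B=2))
print(split_cluster({1, 2, 3, 4}, eta, B=3))
```

For $B = 2$ the middle pair {2, 3} is merged first because its weight includes the shared next-scale node, and the two end nodes are left as singletons since they are not connected in the weighted graph.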

2) Find block-tree graph from clusters: Given the clusters $\mathcal{C}^B = \{C_i^B\}$, we can find a block-graph
and associate weights between the clusters in $\mathcal{C}^B$ such that

$$w_{i,j}^B = \sum_{(u,v) \in (C_i^B \times C_j^B) \cap E} w_{u,v} . \tag{32}$$

Equation (32) corresponds to the sum of all weights connecting two clusters $C_i^B$ and $C_j^B$. Given the
weights in (32), we can easily find a spanning block-tree using the maximum weight spanning tree
(MWST) algorithms of Prim P6l or Kruskal P7l. Fig. 6 shows an example of using Algorithm 2 to find
a maximum weight 2-width spanning block-tree. Fig. 7 shows a collection of spanning block-trees for a
4 x 4 grid graph.

3) Complexity: The complexity of Algorithm 2 depends on the structure of the graph. Assuming the
optimal block-tree graph is given, the complexity of splitting the clusters is $O(|C_{k_i}|^2)$ for each $C_{k_i}$ such
that $|C_{k_i}| > B$. This number will be dominated by the cluster with maximum size; thus, the complexity
of splitting clusters is $O((\mathrm{btw}(G))^2)$. The complexity of finding the final spanning block-tree depends
on the graph and the number of edges in the block-graph formed using the clusters $\mathcal{C}^B$. In general,
the complexity of this step decreases as $B$ increases, since this results in fewer edges and fewer
smaller clusters in $\mathcal{C}^B$. In practice, finding the optimal block-tree graph is hard, so we use the
heuristic algorithm outlined in Section III-C.
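The second step of Algorithm 2, summing edge weights between clusters as in (32) and then extracting a maximum weight spanning tree over the clusters, can be sketched with Kruskal's algorithm and a union-find structure. The cluster list, the edge weights, and the function names below are hypothetical, and the sketch stops at the tree over clusters; it does not check the block-width of the result.

```python
def cluster_weights(parts, edge_weights):
    """Weights of (32): total edge weight between each pair of clusters."""
    owner = {node: i for i, part in enumerate(parts) for node in part}
    cw = {}
    for (u, v), wt in edge_weights.items():
        i, j = owner[u], owner[v]
        if i != j:
            key = (min(i, j), max(i, j))
            cw[key] = cw.get(key, 0.0) + wt
    return cw


def max_weight_spanning_tree(num_clusters, cw):
    """Kruskal's algorithm on the cluster graph, heaviest edges first."""
    root = list(range(num_clusters))

    def find(a):
        while root[a] != a:
            root[a] = root[root[a]]  # path halving
            a = root[a]
        return a

    tree = []
    for (i, j), wt in sorted(cw.items(), key=lambda kv: -kv[1]):
        ri, rj = find(i), find(j)
        if ri != rj:
            root[ri] = rj
            tree.append((i, j))
    return tree


parts = [{0}, {1, 2}, {3}, {4}]
edge_weights = {(0, 1): 1.0, (0, 2): 1.0, (1, 3): 1.0, (2, 4): 1.0,
                (1, 4): 1.0, (3, 4): 1.0, (0, 3): 0.5}
cw = cluster_weights(parts, edge_weights)
tree = max_weight_spanning_tree(len(parts), cw)
print(cw)
print(tree, sum(cw[e] for e in tree))
```

In this toy case the two cluster pairs joined by two unit-weight edges are selected first, and one of the weight-one cluster edges completes the tree, for a total weight of 5.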

B. Estimation Via Matrix Splitting 

This Section reviews the matrix splitting approach to approximate estimation in Gaussian graphical
models. For more details, see [8] and B31. Let $x \in \mathbb{R}^n$ be a Gaussian graphical model defined on a
graph $G = (V, E)$ with covariance $\Sigma$ and observations given by (28). The mmse estimate is given in
(30). An alternate characterization of (30) is in the information form [8]:

$$V \hat{x} = H^T R^{-1} y \tag{33}$$

$$V = J + H^T R^{-1} H , \tag{34}$$

where $J = \Sigma^{-1}$ and $P = V^{-1}$ is the error covariance matrix. The matrices $H$ and $R$ are assumed to
be diagonal, so the sparsity of $V$ is the same as the sparsity of $J$. A family of iterative approximate
estimation algorithms has been proposed in 0, with extensions in fl31, HSl. The
idea is to split the matrix $V$ at each iteration $k$ as $V = V_{S_k} - K_{S_k}$, where $S_k$ is a subgraph of $G$. The
sparsity of $V_{S_k}$ corresponds to the sparsity of the subgraph $S_k$, and the diagonals of $V_{S_k}$ are the same as
the diagonals of $V$. Using matrix splitting, an iterative algorithm for estimation is given as lEl:

$$V_{S_k} \hat{x}^{(k)} = K_{S_k} \hat{x}^{(k-1)} + H^T R^{-1} y , \tag{35}$$

where $\hat{x}^{(k)}$ is the estimate at step $k$. If $S_k$ is a subgraph with low treewidth or low block-treewidth,
computing (35) is computationally tractable. Conditions for convergence of (35) are not known for general
graphical models; however, for walk-summable graphical models P9l, convergence is guaranteed p5|. To
compute the error covariance $P$, we can use the same matrix splitting approach to solve the linear system
$V P = I_n$, where $I_n$ is an $n \times n$ identity matrix [8].

It is clear that the choice of $S_k$ in (35) plays an important role in the convergence of the algorithm.
The problem of adaptively choosing $S_k$ at each iteration was considered in fl31, [50], where the authors
proposed a weight $w_{u,v}^{(k)}$ for $(u, v) \in E$, which signifies the error reduction capacity of an edge
$(u, v)$ in the iterations (35):

$$w_{u,v}^{(k)} = \left( |h_u^{(k-1)}| + |h_v^{(k-1)}| \right) |J(u,v)| \tag{36}$$

$$h^{(k-1)} = H^T R^{-1} y - V \hat{x}^{(k-1)} . \tag{37}$$

Thus, at each iteration we want to choose a subgraph $S_k$ such that the sum of all the weights in the
subgraph is maximized while $S_k$ is still a tractable subgraph. A popular approach is to use spanning trees,
since finding an optimal spanning tree is efficient. However, as shown in [50], using tractable subgraphs
that are not trees leads to faster convergence. Motivated by the need for algorithms with faster convergence,
we propose to use spanning block-trees for approximate estimation of Gaussian graphical models. Thus,
at each iteration we compute a weighted graph using (36) and then use these weights to compute a
$B$-width spanning block-tree using Algorithm 2. In the next Section, we provide experimental results and
show the improved convergence rates of estimation when using spanning block-tree graphs over spanning
trees.
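The iteration (35) with the adaptive weights (36)-(37) can be tried on a toy problem. The sketch below is an illustration under assumptions that are not in the text: the graph is a four-node cycle, $J$ has unit diagonal and off-diagonal entries of -0.3, $H = I$, $R = 10 I$, and the observation vector is arbitrary. The tractable subgraph is a spanning tree obtained by dropping the cycle edge with the smallest weight, so this is the spanning-tree special case, not a spanning block-tree; the dense solver and the function names are hypothetical helpers.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting (dense, small systems only)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def splitting_estimate(V, J, b, edges, iters=60):
    """Iterate (35), dropping at each step the edge with the smallest weight (36)."""
    n = len(b)
    x = [0.0] * n
    residual_norms = []
    for _ in range(iters):
        h = [b[i] - sum(V[i][j] * x[j] for j in range(n)) for i in range(n)]  # (37)
        residual_norms.append(sum(v * v for v in h) ** 0.5)
        weights = {(u, v): (abs(h[u]) + abs(h[v])) * abs(J[u][v]) for u, v in edges}
        u, v = min(weights, key=weights.get)  # edge left out of the spanning tree
        VS = [row[:] for row in V]
        K = [[0.0] * n for _ in range(n)]
        for a, c in ((u, v), (v, u)):
            K[a][c] = -V[a][c]   # K = V_S - V, and V_S is zero on the dropped edge
            VS[a][c] = 0.0
        rhs = [b[i] + sum(K[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = solve(VS, rhs)       # (35)
    return x, residual_norms


n = 4
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
J = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
for u, v in edges:
    J[u][v] = J[v][u] = -0.3
V = [[J[i][j] + (0.1 if i == j else 0.0) for j in range(n)] for i in range(n)]  # (34)
y = [1.0, -2.0, 0.5, 3.0]
b = [yi / 10.0 for yi in y]      # H^T R^{-1} y with H = I and R = 10 I

x_iter, residual_norms = splitting_estimate(V, J, b, edges)
x_direct = solve(V, b)
print(max(abs(p - q) for p, q in zip(x_iter, x_direct)))
print(residual_norms[0], residual_norms[-1])
```

The iterate agrees with the direct solution of (33) and the residual norm shrinks toward zero, as expected for this diagonally dominant example. Replacing the single dropped edge by a subgraph returned from a spanning block-tree routine would change only how `VS` and `K` are built.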

C. Experimental Results 

Let $G = (V, E)$ be an undirected graph and suppose $x$ is a Gaussian graphical model defined on $G$
with covariance $\Sigma$ and $J = \Sigma^{-1}$. We assume the diagonals of $J$ are unity and let $S = I - J$. The
non-zero entries in $S$ correspond to the edges in the graph. To construct Gaussian graphical models, we
choose the non-zero entries in $S$ uniformly between $[-1, 1]$ and rescale $S$ so that $\rho(\bar{S}) = 0.99$, where $\bar{S}$
is the matrix of absolute values of the elements of $S$ and $\rho(\cdot)$ is the spectral radius. From P9l, $\rho(\bar{S}) < 1$
ensures that the graphical model is walk-summable, which in turn ensures convergence of the iterative
approximate estimation algorithm in (35) P31. In all experiments, we assume that the observations are
given by $y_s = x_s + n_s$, where $n_s \sim \mathcal{N}(0, 10)$. At each iteration, we compute the residual error, defined
through the norm $\|h^{(n)}\|_2$ of the residual vector $h^{(n)} = [h_1^{(n)}, \ldots, h_n^{(n)}]^T$, for $h^{(n)}$ defined in (37).

Fig. 8(a) shows results of doing estimation over a randomly generated 50 x 50 grid graph using
spanning trees (Tree), spanning block-trees with block-width of three (BT-3), and spanning block-trees
with block-width of five (BT-5). It is clear that using spanning block-trees leads to faster convergence.
The same results hold for Fig. 8(b), which shows results of doing estimation over a randomly generated
70 x 70 grid graph.

Fig. 8. Normalized residual error when doing approximate estimation on a 50 x 50 grid graph and a 70 x 70 grid graph:
(a) estimating x on a 50 x 50 grid graph; (b) estimating x on a 70 x 70 grid graph.

Fig. 9 shows results of doing estimation over a randomly generated 15 x 15 grid graph where two
nodes are connected to all the nodes in the graph. Such graphs, where a few nodes have very high degree,
are useful in video surveillance, modeling air traffic routes using a hub-and-spoke model, or applications
using small-world graphs. Fig. 9(a) plots the residual error at each iteration for the estimate and Fig. 9(b)
plots the residual error at each iteration for the error covariance. Again, we observe that using spanning
block-trees leads to faster convergence. The above simulations show that using spanning block-trees for
approximate estimation is viable and leads to improved convergence speed.

Fig. 9. Normalized residual error when doing approximate estimation using spanning trees, spanning block-trees with
block-width of two (BT-2), and spanning block-trees with block-width of three (BT-3) on a 15 x 15 grid graph with two nodes
connected to all other nodes in the graph.



VIII. Summary 

We introduced block-tree graphs as an alternative to junction-trees for constructing tree-structured
graphs for arbitrary graphical models. We showed that constructing block-tree graphs is simple and
only requires information about a root cluster, which is a small number of nodes. On the other hand,
constructing junction-trees requires knowledge of almost all nodes in the graph. For graphical models
with boundary conditions, we showed that the block-tree graph framework leads to natural representations
where we converted a boundary valued problem into an initial value problem. For Gaussian graphical
models, the block-tree graph framework leads to state-space representations, using which we can easily
recover recursive algorithms. Using the block-tree graph framework, we derived an algorithm for approximate
estimation in Gaussian graphical models. The need for such algorithms arises because the problem of exact
optimal estimation is computationally intractable for graphs with high treewidth. We proposed the use
of spanning block-trees to derive approximate estimation algorithms for Gaussian graphical models. We
showed that convergence is faster when using spanning block-trees than when using
spanning trees. Further applications of spanning block-trees can be explored when doing approximate
inference over discrete graphical models, using the results of [34], lISTl.
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